Integrating Multiple Internet Directories by Instance-based Learning
Abstract
Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of pages. We propose a method for integrating multiple Internet directories by instance-based learning. Our method provides the mapping of categories in order to transfer documents from one directory to another, instead of simply merging two directories into one. We present herein an effective algorithm for determining similar categories between two directories via a statistical method called the κ-statistic. In order to evaluate the proposed method, we conducted experiments using two actual Internet directories, Yahoo! and Google. The results show that the proposed method achieves extensive improvements relative to both the Naive Bayes and Enhanced Naive Bayes approaches, without any text analysis on documents.
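To make the category-similarity idea concrete, the following is a minimal sketch that assumes the statistic named in the abstract is Cohen's kappa, computed over the URLs that appear in both directories. The function name `kappa_agreement`, the set-based representation, and the handling of the degenerate case are illustrative assumptions, not details taken from the paper.

```python
def kappa_agreement(urls_a, urls_b, shared_urls):
    """Cohen's kappa for how well two categories agree on shared URLs.

    urls_a:      set of URLs filed under a category in the first directory
    urls_b:      set of URLs filed under a category in the second directory
    shared_urls: set of URLs that occur somewhere in both directories
    (All names and the representation are hypothetical.)
    """
    n = len(shared_urls)
    if n == 0:
        raise ValueError("no shared URLs to compare")

    in_a = shared_urls & urls_a
    in_b = shared_urls & urls_b

    # 2x2 contingency counts over the shared URLs.
    n11 = len(in_a & in_b)        # in both categories
    n10 = len(in_a - in_b)        # only in the first category
    n01 = len(in_b - in_a)        # only in the second category
    n00 = n - n11 - n10 - n01     # in neither category

    observed = (n11 + n00) / n
    expected = ((n11 + n10) * (n11 + n01)
                + (n01 + n00) * (n10 + n00)) / (n * n)

    if expected == 1.0:
        # Degenerate case: kappa is undefined; returning 0.0 is an
        # arbitrary choice made for this sketch.
        return 0.0
    return (observed - expected) / (1.0 - expected)
```

Under this reading, a value near 1 suggests that two categories hold largely the same shared pages and are candidates for mapping to each other, while values near 0 or below suggest no more agreement than chance. No text analysis of the pages is involved, only their category membership.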
Similar resources
Interactive Retrieval of Nature Images Using Multiple-Instance Learning
Content-based image retrieval (CBIR) has received considerable research interest in recent years. The basic problem in CBIR is the semantic gap between high-level image semantics and low-level image features. Region-based image retrieval and learning from user interaction through relevance feedback are two main approaches to solving this problem. Recently, the research in integra...
An Examination of the Relationships between Internet Directories
Finding desired information on the Internet is becoming increasingly difficult. Internet directories such as Yahoo!, which organize web pages into hierarchical categories, provide one solution to this problem; however, such directories are of limited use because some bias is applied both in the collection and categorization of the pages. Therefore, we propose a method for integrating multiple in...
Identifying Predictive Structures in Relational Data Using Multiple Instance Learning
This paper introduces an approach for identifying predictive structures in relational data using the multiple-instance framework. By a predictive structure, we mean a structure that can explain a given labeling of the data and can predict labels of unseen data. Multiple-instance learning has previously only been applied to flat, or propositional, data and we present a modification to the framew...
Automated Alignment of Multiple Internet Directories
Directory services are tools for making useful information more accessible, but individual Internet directories in directory services are of limited use in finding user-relevant web pages. In this paper, we propose a method for aligning URL information from one Internet directory to another. This method can discover an appropriate position in a directory for a web page which is not shown in that ...